TITLE OF THE INVENTION

Multi-Stream FFT for MIMO-OFDM systems

FIELD OF THE INVENTION

The present invention relates to a processor and method for 
subjecting multiple parallel input data streams to Fast 
Fourier Transformation, FFT. 

BACKGROUND OF THE INVENTION 

By using Fast Fourier Transformation, the Discrete Fourier 
Transform can be obtained. This is important in many signal 
processing scenarios. 

In mobile communication scenarios in particular, the FFT is
required for various purposes. Conventionally, in case a
single data stream is to be subjected to FFT transformation,
various approaches for accomplishing this are known. A single
data stream is often referred to as SISO, "Single Input
Single Output". As a typical SISO scenario, one might
consider a case in which a communication network entity such
as a base station or Node_B transmits data via a single
antenna or antenna element to a mobile station or user
equipment with one antenna element (or vice versa).

On the other hand, with further developments in
communication technology, scenarios which apply multiple
antenna elements for transmission and for reception are
implemented and under investigation. In such cases, a
so-called "Multiple Input Multiple Output", MIMO, concept is
present. MIMO concepts are often applied in connection with
Orthogonal Frequency Division Multiplex, OFDM, systems.

MIMO-OFDM (multiple-input multiple-output orthogonal
frequency division multiplex) systems offer a remarkable
increase in link reliability and/or in data rate. However,
this technique suffers from higher hardware complexity. For
this reason, there is a need for clever strategies to reduce
the hardware expenditure.

Evidently, with multiple input data streams being present
simultaneously, i.e. in parallel, all of those data streams
have to be subjected to FFT. This poses a problem in terms
of processing load, processing speed, and/or complexity for
the signal processing methods and hardware used for this
purpose.

The FFT transformation is a central process in conventional
OFDM (SISO-OFDM: single-input single-output OFDM) systems.
The transition to the MIMO technique results in an OFDM
system with several FFT transformation processes in
parallel. For instance, MIMO systems with four receiver
antenna elements need four FFT transformations. In
straightforward solutions, four FFT processing blocks have
to be installed. This leads to much higher hardware
complexity. Hence, there is a need for a new implementation
strategy of the FFT for MIMO systems.

He and Torkelson have presented "A new approach to pipeline
FFT processor" in IEEE Proceedings of IPPS '96, 1996, pp.
766 to 770. This document introduces various pipeline FFT
processors for SISO scenarios.

For better understanding of the present invention to be
described hereinafter, a brief review and introduction of
the FFT pipeline architecture as presented by He and
Torkelson is given. A particularly usable FFT is briefly
introduced to give an idea of the main structure and its
properties.

To this end, the SISO Radix 2^2 single-path delay feedback
(SDF) architecture proposed by He & Torkelson will be
considered. This architecture is also referred to as
R2^2SDF.

FFT for SISO Systems according to He & Torkelson 

As mentioned, a structure of the FFT algorithm was
proposed in which a Radix 2^2 single-path delay feedback
(SDF) architecture is used. Because of the SDF, the spatial
regularity of the resulting architecture / signal flow
graph can be exploited. The resulting hardware requirement
is minimal for both dominant components: complex
multipliers and complex data memory.

For a hardware-oriented implementation, this approach
combines the advantages of the signal flow graph, SFG, of
the radix 4 and radix 2 approaches. The radix 4 SFG requires
a minimum of non-trivial multipliers, whereas the radix 2
SFG uses a simple butterfly structure.

Figure 1 illustrates the resulting signal flow graph
structure for N=16 (16-point FFT), i.e. a received data
stream to be subjected to FFT is assumed to comprise N=16
samples (N samples forming one symbol). Trivial
multiplications denoted by the multiplier "-j" appear
between a first, BF I, and a second, BF II, stage of the
SFG. At the first stage, a simple butterfly structure is
used. Then, in the second stage, the same calculation
process is realized. Additionally, the last N/4=4 outputs of
the first stage BF I are multiplied by -j. Assuming a
complex number Z = R + j*I with R denoting the real
component and I denoting the imaginary component, a
multiplication by "-j" will then lead to -j*Z = -j*R + I.
Evidently, the real and imaginary parts are exchanged and
the sign of the resulting imaginary part is inverted.
Therefore, this multiplication is regarded as trivial
(real-imaginary swapping and sign inversion). These
operations are indicated by diamond symbols in Figure 1.
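
The trivial multiplication described above can be illustrated by a
short Python sketch. The function name mul_minus_j is chosen here for
illustration only and does not appear in the figures; the sketch merely
shows that the product with -j needs no multiplier.

```python
def mul_minus_j(re, im):
    """Return the real and imaginary part of (re + j*im) * (-j).

    -j * (R + j*I) = I - j*R, so the new real part is the old imaginary
    part and the new imaginary part is the negated old real part.
    """
    return im, -re


# Cross-check against Python's built-in complex arithmetic.
z = complex(3.0, 4.0)
rotated = complex(*mul_minus_j(z.real, z.imag))
assert rotated == -1j * z
```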

After these two stages, full multipliers are required to
compute the product with the decomposed twiddle factor. The
multipliers perform a multiplication with multiplication
factors W (twiddle factors). Twiddle factors are those
coefficients applied to results from a previous stage in
order to combine these to form inputs of a next stage.

Applying the Common Factor Algorithm, CFA, procedure
recursively to the remaining DFTs (Discrete Fourier
Transforms) of length N/4, the complete radix 2^2 DIF FFT
algorithm is obtained, as shown in Figure 2. As an
explanatory remark, using such an approach, a number of
N=16 data sets (samples) of an incoming stream is
decomposed in a pipeline fashion into a succession of
log2(N) = 4 stages. That is, for N=16 data samples, a
4-stage FFT SFG and/or architecture will result (total
number of stages k=4 in this example). A respective i-th
stage (i=1...4) is designed to process a number of data sets
of 2^(log2(N)+1-i). Thus, the first stage (i=1) BF I
receives/processes 16 data samples, and the fourth stage
(i=4) BF IV receives/processes 2 data samples.

Architecture

In the following, the architecture will be described with
reference to a DFT example for N=16 samples.

As shown in Fig. 2, the FFT structure for N=16 data samples
has four butterfly stages BFI, ..., BFIV. Note that BFI,
..., BFIV denote the stages and do not denote the BF types
employed in a respective stage. It can be seen that the
non-trivial multipliers are located between the second,
BFII, and the third stage, BFIII, according to the signal
processing order. In addition, the rotations (trivial
multiplications) by -j are done after the first, BFI, and
after the third, BFIII, stage. Fig. 3 illustrates the
resulting pipeline architecture. The blocks above the
butterfly structures indicate FIFO memories, and the numbers
indicated therein denote the delay imposed thereby, i.e. the
number of samples buffered by these.

The FIFO memories are located in the single delay feedback
path of the structure. FIFO memories are particularly
useful in terms of hardware, but the FIFO property could
also be realized by another memory type in combination with
appropriate addressing of the memory in order to read out
the stored data in FIFO fashion.

For instance, the FIFO in the first stage after the input
port has a length of 8 symbols. Evidently, the number of
delay elements, i.e. the number of samples buffered in the
feedback path of an i-th stage out of k stages, is N/2 for
i=1, N/4 for i=2, N/8 for i=3, and N/16 for i=4, and can
generally be expressed as N/2^i for an i-th stage.
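
The feedback delays per stage can be tabulated with a few lines of
Python. This is a minimal sketch; the function name fifo_lengths is an
assumption made for illustration. It also shows that the delays of all
k stages add up to N-1 buffered samples.

```python
def fifo_lengths(n_fft):
    """Feedback FIFO length N/2^i for each stage i = 1..k, with N = 2^k."""
    if n_fft < 2 or n_fft & (n_fft - 1):
        raise ValueError("n_fft must be a power of two >= 2")
    k = n_fft.bit_length() - 1            # k = log2(N) pipeline stages
    return [n_fft >> i for i in range(1, k + 1)]


lengths = fifo_lengths(16)
print(lengths)       # [8, 4, 2, 1]
print(sum(lengths))  # 15, i.e. N - 1
```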

The data control for the butterflies is indicated by the
bar at the bottom of the figure, which schematically
indicates control signals supplied to the four stages 1...4
of the pipeline architecture. Butterfly stages of type I
(BF2I) receive a single control signal only and are applied
in stages i=1 and i=3, and Butterfly stages of type II
(BF2II) receive two control signals and are applied in
stages i=2 and i=4. The twiddle factors W(n) are, for
example, read out from a memory (not shown in Fig. 3) with
appropriate timing. The timing of the control signals
supplied to BF2I and BF2II stages as well as for twiddle
factor generation/supply depends on the clock rate of the
FFT device.

The internal structure of the respective butterfly stage is
shown in Fig. 4 (BF2I) and Fig. 5 (BF2II). Note that input
and output ports are divided into a real (index r) and
imaginary (index i) part. N denotes the number of symbols
contained in the stream to be subjected to FFT processing
and n is an index variable with 1<=n<=N. (The memory
"capacity" of e.g. the FIFO in the feedback path depends on
the stage index i with 1<=i<=k.)

Fig. 11A and 12 show details of the data control in terms
of control signals applied and timing relations
therebetween, as will be described later on.

The calculation process at each stage is done in two steps.

In the first step (control signal s = 0), the data sequence
x(n) (n=1..16/2) is read at the input ports
xr(n+N/2)/xi(n+N/2) and is directly written to the ports
Zr(n+N/2)/Zi(n+N/2), which are connected to the FIFO. At the
same time, the FIFO content is read at the ports
xr(n)/xi(n) and is directly written, as the other output
port pair, to the ports Zr(n)/Zi(n), which are connected to
the next pipeline stage.

In the second step (control signal s = 1), after N/2=8
symbols, the stored data and the remaining input symbols
x(n) (n=9..16) are used to compute the stage output, of
which one half is written to the next stage (ports
Zr(n)/Zi(n)) and the other half is stored in the FIFO memory
(ports Zr(n+N/2)/Zi(n+N/2)).
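
The two-step operation can be mimicked by a small behavioural model in
Python. This is a sketch under stated assumptions, not the circuit of
Fig. 4: samples are indexed from 0, the FIFO is assumed to hold zeros
initially, and, as in a standard decimation-in-frequency butterfly, the
sum is assumed to be the half passed to the next stage while the
difference is the half kept in the FIFO. The name sdf_stage is
hypothetical.

```python
from collections import deque


def sdf_stage(samples, delay):
    """Behavioural model of one single-path delay feedback butterfly stage."""
    fifo = deque([0j] * delay)        # feedback FIFO, assumed zero-filled
    out = []
    for idx, x in enumerate(samples):
        s = (idx // delay) % 2        # control signal s: 0 first half, 1 second half
        stored = fifo.popleft()
        if s == 0:
            fifo.append(x)            # step 1: input is written to the FIFO ...
            out.append(stored)        # ... and the FIFO content is passed on
        else:
            out.append(stored + x)    # step 2: sum goes to the next stage
            fifo.append(stored - x)   # difference is stored in the FIFO
    return out


N = 16
x = [complex(n) for n in range(N)]
# N/2 trailing zeros flush the stored differences out of the FIFO.
y = sdf_stage(x + [0j] * (N // 2), delay=N // 2)
sums, diffs = y[N // 2:N], y[N:]
assert sums == [x[n] + x[n + N // 2] for n in range(N // 2)]
assert diffs == [x[n] - x[n + N // 2] for n in range(N // 2)]
```

The first N/2 outputs are the initial FIFO content; the sums and
differences follow with a latency of N/2 samples.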

To accomplish such processing, the internal structure uses
adders/subtractors and internal signal feeding paths as
shown in Fig. 4. In addition, supplying the signals to the
FIFO memory and/or the next Butterfly stage is accomplished
using switches under control of the control signal s. The
operational condition of a respective switch is denoted by
0 and/or 1, which represents the respective state of the
control signal s applied in order for the switch to be in
the respective operational condition. An adder is
illustrated by an encircled "+", a subtracter is
illustrated by an encircled "+" with an additional
subscript.

The calculation process of the butterfly stage BF2II
differs slightly from the one done in BF2I. Since these
stages additionally include the -j rotations, i.e. the
"trivial" multiplications by "-j", the real and imaginary
parts of input signals have to be swapped. In addition, the
signs also have to be changed as shown in Fig. 5. This is
controlled by the signal t. The negated signal t is
logically combined in an AND gate with the signal s and
controls the swapping paths at the input terminals
xr(n+N/2), xi(n+N/2) as well as the adders/subtractors in
the signal paths associated with the signals xi(n) and
xi(n+N/2). Thus, for s=1 and t=0 there occurs a swapping
and a conversion of the adder; otherwise there is no
swapping and no conversion of the adder. The remaining
process and architecture is equal to the BF2I process.

Fig. 11A shows details of control signals with a
corresponding timing relation being illustrated in Fig. 12.

As shown in Fig. 11A, a clock signal clk is supplied to the
(FIFO) memory, a twiddle factor generation means (e.g.
including a memory from which the factors are read out) and
the BF2II stage. A signal supplied to the BF2II stage from
a preceding stage is denoted with x, and signals s and t as
explained before are also supplied. A signal leaving the
BF2II stage to a subsequent multiplier is denoted with z
and supplied to the multiplier for multiplication with a
twiddle factor w. Afterwards, the multiplied signal is
forwarded to the next stage (not shown in Fig. 11A). (Note
that substantially the same holds for a stage of type BF2I,
with the difference that the control signal t is not
applied and that a signal z leaving a stage of BF2I type
will be supplied to a BF2II stage (input signal x) and not
to a multiplier performing multiplication with twiddle
factors.)

Fig. 12 shows the timing relation therebetween. In the
lower part of Fig. 12, the signals z, w and clk are
supplied in synchronism with each other. With each clock
cycle clk, a new signal z is supplied to the multiplier,
which is in synchronism therewith supplied with a
corresponding weight (twiddle) factor w.

In the upper part of Fig. 12 it is shown that a sample x of
a sequence of 1 ... N samples (forming one OFDM symbol) is
supplied with each clock cycle clk. Initially, the signal s
assumes a low level (s=0) for the first N/2 samples.
Thereafter, starting with sample N/2+1, it assumes a high
level until N samples have been supplied. (Thereafter, a
new OFDM symbol sequence starts and s=0.) As to the signal
t, this signal assumes a high level for the first 3*N/4
samples and changes afterwards (starting with sample
3/4*N+1) for the last N/4 samples to the low level.
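
The timing of s and t described above can be written down as a short
Python sketch. The function name control_signals and the derived column
swap (s AND NOT t, the condition under which the BF2II stage swaps real
and imaginary parts) are naming assumptions made for illustration; n
denotes the block length seen by the stage.

```python
def control_signals(n):
    """Return (sample index, s, t, swap) for one block of n samples (1-based)."""
    rows = []
    for idx in range(1, n + 1):
        s = 1 if idx > n // 2 else 0          # low for the first n/2 samples
        t = 1 if idx <= 3 * n // 4 else 0     # high for the first 3n/4 samples
        swap = s & (1 - t)                    # s AND (NOT t)
        rows.append((idx, s, t, swap))
    return rows


for row in control_signals(16):
    print(row)
```

For a block of 16 samples, the swap condition is active only for
samples 13 to 16, i.e. for the last N/4 samples.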

Finally, Table 1 shows the complexity of this prior art
FFT architecture, which is used in the further development
of the multi-stream transformation for MIMO-OFDM systems.

           Multiplier       Adder           Memory Size   Control
R2^2SDF    log4(Nfft) - 1   4 log4(Nfft)    Nfft - 1      Simple

Table 1: Computational Complexity of the FFT

FFT for MIMO Systems

Now, two straightforward architecture alternatives are
presented for MIMO systems based on this FFT structure.
Although other FFT structures could be used, in the
following the previously described FFT structure (R2^2SDF)
is implemented for MIMO systems. There are two possible
strategies to realize the transformation process for an Mr
antenna system, i.e. a system having a number of Mr
antennas.

Fig. 6 shows a fully parallel implementation with one FFT
block per data stream to be transformed. Thus, on the one
hand, a number Mr of FFT blocks can be implemented, i.e.
one for each stream (see Fig. 6 for the example of Mr=4).
It can be seen that the complexity of such a system grows
linearly with the number of antennas (i.e. Mr times the
complexity of one FFT).



WO 2007/003977



PCT/IB2005/001867 




On the other hand, to reduce the complexity of the system,
the transformation process can be done successively by a
smaller number (Mfft) of FFT blocks (straightforward
successive FFT solution). In order to transform Mr parallel
streams successively, the FFT has (or the FFTs have) to
work at a higher rate. Because of the FFT pipeline
structure used, the frequency can be increased arbitrarily.

Fig. 7 illustrates such a successive transformation process
for Mr=4 and Mfft=1, i.e. using a single FFT only. For this
processing, the input streams are multiplexed upstream of
the FFT using a multiplexer MUX and demultiplexed using a
demultiplexer DeMUX downstream of the FFT. This strategy
results in a reduction of computational complexity,
depending on the sharing ratio (Mr/Mfft). Unfortunately,
each stream requires an additional input buffer that
collects one OFDM symbol before sending it to the FFT.

Fig. 8 illustrates the timing of signal processing of the
structure shown in Fig. 7. In a first step, Nfft symbols of
each stream (example: number of streams Mr=4) are written
to the corresponding stream buffer. Due to the Mr streams
arriving in parallel, the Mr buffers are filled
simultaneously. Finally, after the buffering period, each
buffer successively shifts its content into the FFT block,
which works at a higher rate. Since the buffer content of
the streams is used sequentially and new data symbols are
continuously fed to the FFT at the same time, another
buffer (not shown) is needed.

In a first buffer area I, samples of Mr data streams are
buffered. Assuming a multiplexing sequence of the Mr
streams 1...4, the samples of stream 1 are used as FFT
input first. In the meantime, further data samples of
following symbols are buffered in a buffer area II for
streams 2...4. Samples of stream 2 will be subjected to FFT
processing next, which is the reason why buffer area II for
stream 2 will not fill too much. Since streams 3 and 4 will
be subjected to FFT processing second to last and last,
respectively, the respective buffer area II for these
streams will be filled to a greater extent. The indicated
multiples of Nfft denote the additional amount of buffer
memory required for buffer area II.

The need for and the size of the additional buffer area can
also be seen on the time axis t in Fig. 8. At the time when
the first sequence is fed into the FFT, the incoming values
of the remaining sequences have to be buffered until the
FFT block has finalized the input process for the first
sequence. For the second sequence for Mr=4, the FFT is able
to read the next sequence after N/Mr=0.25N time steps. This
results in an absolute value of t=1.25N. For the 3rd and
4th sequences, the waiting or buffer time is 2N/Mr=0.5N
(absolute: t=1.5N) and 3N/Mr=0.75N (absolute: t=1.75N).
Consequently, the data input for all sequences is finalized
after N time steps and at the time t=2N the next OFDM
symbol period begins.

Assuming an FFT processing rate four times higher than the
symbol rate, the additional memory size for buffering is

$$\text{Buffer II} = \tfrac{3}{2}\, N_{FFT}$$
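
In addition, the FFT uses a memory of size Nfft-1. Thus,
the overall memory size (complex symbols) is given by

$$\text{Memory} = \underbrace{M_r N_{FFT}}_{\text{Buffer I}} + \text{Buffer II} + M_{FFT}\,(N_{FFT} - 1)$$

For a system with four antennas (Mr=4) and one FFT (Mfft=1),
the above equation can be simplified to

$$\text{Memory} = \underbrace{4 N_{FFT}}_{\text{Buffer I}} + \underbrace{\tfrac{3}{2} N_{FFT}}_{\text{Buffer II}} + \underbrace{N_{FFT} - 1}_{\text{FFT}} = 6.5\, N_{FFT} - 1$$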

For MIMO receivers with Mr antennas, Mr independent data
symbol streams have to be transformed. Usually, according
to the approach introduced with reference to Fig. 6, the
data symbols are fed into Mr FFT blocks. Especially for
large FFT lengths, this results in highly complex system
architectures.

As shown in the successive processing alternative
introduced with reference to Figs. 7 and 8, there is a
possibility to reduce the architecture complexity down to
the complexity of one FFT. Unfortunately, the memory
consumption of this option increases from 4Nfft-4 (parallel
FFTs solution) to 6.5Nfft-1 complex symbols.
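
The two memory figures can be reproduced with a short Python sketch.
The function names are hypothetical. The buffer II term sums the waiting
times m*N/Mr of streams 2 to Mr given above for Mr=4; applying the same
rule to other values of Mr is an assumption of this sketch, as is the
use of a single FFT block in the successive case.

```python
def successive_memory(n_fft, m_r=4):
    """Memory (complex symbols) of the successive solution with one FFT block."""
    buffer_1 = m_r * n_fft                                   # one symbol per stream
    buffer_2 = sum(m * n_fft // m_r for m in range(1, m_r))  # waiting streams
    fft_memory = n_fft - 1                                   # feedback FIFOs
    return buffer_1 + buffer_2 + fft_memory


def parallel_memory(n_fft, m_r=4):
    """Memory (complex symbols) of m_r FFT blocks working in parallel."""
    return m_r * (n_fft - 1)


n = 64
print(successive_memory(n))  # 415, i.e. 6.5 * 64 - 1
print(parallel_memory(n))    # 252, i.e. 4 * 64 - 4
```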

SUMMARY OF THE INVENTION 

Hence, it is an object of the present invention to provide
an improved signal processor for FFT transformation as well
as a corresponding method which is free from the above
mentioned drawbacks inherent to known approaches.

According to the present invention, this object is for
example achieved by

a signal processor for Fast Fourier Transformation, FFT, of
Mr, Mr > 1, input data streams supplied in parallel,
comprising a multiplexing device having Mr input terminals
each receiving one of the Mr input data streams and an
output terminal at which the Mr input data streams are
output in a multiplexed manner, a Fast Fourier
Transformation device configured to perform Fast Fourier
Transformation of a data stream supplied at an input
terminal thereof and to output the FFT transformed data
stream at an output terminal thereof, the input terminal of
the Fast Fourier Transformation device being connected to
the output terminal of the multiplexing device, and a
demultiplexing device having an input terminal connected to
the output terminal of the Fast Fourier Transformation
device and Mr output terminals at which a respective one of
Mr transformed output data streams is output in a
demultiplexed manner, characterized in that each of the Mr
input data streams contains a number of N=2^k samples, the
Fast Fourier Transformation device has a pipeline
architecture composed of k stages with a respective
feedback path including a single delay element per each
stage of the pipeline architecture and is controlled by a
first and second internal control signals, wherein the
delay element in a feedback path of an i-th stage, 1<=i<=k,
of the pipeline architecture imposes a delay of Mr*N/2^i
samples, the first internal control signal is clocked Mr
times faster compared to a clock rate at which the samples
of the Mr streams are supplied, and the second internal
control signals are clocked Mr times slower compared to the
first internal control signal.

According to advantageous further developments of the
signal processor,

- the multiplexing device (MUX) is configured such that the
Mr input data streams are multiplexed per data sample of
the input data streams and the demultiplexing device
(DEMUX) is configured such that the transformed input data
stream is demultiplexed per data sample of the transformed
data stream;

- a control signal supplied to the multiplexer and
demultiplexer is clocked at a rate Mr times the clock rate
of the supplied streams;

- the Fast Fourier Transformation device (FFT) has a
Radix-2^2 Single-path Delay Feedback, R2^2SDF, architecture;

- the pipeline architecture of the Fast Fourier
Transformation device is composed of Butterfly stages of
types I and II;

- the first stage of the pipeline architecture
receiving the multiplexed data streams is a Butterfly stage
of type I for even and odd total numbers of k.

According to the present invention, further a network
element of a communication network comprising a signal
processor according to any of the preceding aspects is
concerned.

According to the present invention, further a terminal
configured to communicate via a communication network, the
terminal comprising a signal processor according to any of
the preceding aspects is concerned.

Still further, according to the present invention, a system
comprising at least one of a terminal according to any of
the above aspects and a network element according to any of
the above aspects is concerned.

Also, according to the present invention, a computer chip
comprising at least a signal processor according to any of
the preceding aspects is concerned.

According to the present invention, this object is for
example achieved by

a signal processing method for performing Fast Fourier
Transformation, FFT, of Mr, Mr > 1, input data streams
supplied in parallel, comprising the steps of multiplexing
the Mr input data streams to a multiplexed data stream,
performing Fast Fourier Transformation of the multiplexed
data stream and outputting the transformed data stream,
demultiplexing the transformed data stream to Mr
transformed output data streams, characterized by each of
the Mr input data streams containing a number of N=2^k
samples, performing FFT transformation using a pipeline of
k stages with a respective feedback path imposing a delay
on the samples per each stage of the pipeline and
controlling the performing of the FFT transformation by a
first and second internal control signals, and by imposing
a delay of Mr*N/2^i samples on the samples in the feedback
path of an i-th stage, 1<=i<=k, of the pipeline, clocking
the first internal control signal Mr times faster compared
to a clock rate at which the samples of the Mr streams are
supplied, and clocking the second internal control signals
Mr times slower compared to the first internal control
signal.

According to advantageous further developments of the
signal processing method,

- multiplexing is accomplished such that the Mr input
data streams are multiplexed per data sample of the input
data streams and demultiplexing is accomplished such that
the transformed data stream is demultiplexed per data
sample of the transformed data stream;

- clocking to the multiplexer and demultiplexer is
performed at a rate Mr times the clock rate of the
supplied streams;

- the Fast Fourier Transformation processing is based
on a Radix-2^2 Single-path Delay Feedback algorithm;

- the pipeline of processing stages for the Fast
Fourier Transformation is composed of Butterfly stages of
types I and II (BF2I, BF2II);

- the first stage of the pipeline receiving the
multiplexed data stream is a Butterfly stage of type I for
even and odd total numbers of k.

Still further, according to the present invention, a
computer program product for a computer, comprising
software code portions for performing the steps of any one
of the above method aspects when the program is run on the
computer, is concerned.

In this regard, the computer program product advantageously
comprises a computer-readable medium on which the software
code portions are stored.

According to the present invention, at least the following
advantages can be achieved compared to pre-existing
concepts:

The present invention concentrates on the Fast Fourier
transformation in MIMO-OFDM systems. The proposed FFT
structure and method enable a transformation process of
several incoming data streams in parallel.

However, the present invention is not limited to OFDM
systems but can be applied to other scenarios in which
parallel input data streams are to be subjected to FFT. For
example, it can be applied for frequency domain filtering
at a multiple antenna receiver or transmitter. As examples
of OFDM systems, it can be applied to WLAN systems or other
communication systems such as those currently investigated
and referred to as 3.9G and 4G radio communication systems.

The new multi-stream FFT structure offers a reduction of
the computational complexity down to one FFT for all
parallel data streams. In contrast to the successive
implementation introduced above, this strategy requires
less memory (4Nfft-4 complex symbols) at the same
computational complexity.

The proposed architecture combines the optimum properties
of the parallel and the straightforward successive
multi-stream FFT. The proposed architecture/method has the
same computational complexity as the straightforward
successive FFT solution. Thus, the gain compared to the
parallel solution is equal to the number of parallel
streams (Mr). It has the same memory consumption as the
parallel FFT solution. The difference to the
straightforward successive solution is more than 2.5Nfft
complex symbols of memory. The lower complexity results in
lower costs. It can be realized with very little control
"overhead" by merely adjusting the buffer capacity in the
feedback paths and adjusting the timing of the control
signals.

The significant reduction of the number of FFT blocks
results in a corresponding reduction of cost for MIMO
systems. Moreover, a memory reduction of about 1/3 compared
to a successive implementation using the R2^2SDF pipeline
architecture becomes possible by improved data processing
timing and feedback path delay adjustment.




The concept underlying the present invention can be applied 
to all SDF pipeline FFT architectures with feedback delay 
elements in the single delay feedback path. 

Together with an increased processing rate of the FFT a 
slight increase in power consumption is to be expected, if 
the FFT is for example implemented in CMOS technology. 
However, the particular hardware realization is not limited 
to CMOS, but other technology concepts known for 
implementing digital circuits are likewise applicable. 

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be described with reference to 
the accompanying drawings in which 

Fig. 1 shows a signal flow graph of a Butterfly structure
with decomposed twiddle factors;

Fig. 2 shows a Radix 2^2 DIF FFT signal flow graph for N=16
samples;

Fig. 3 shows a Radix 2^2 SDF pipeline FFT architecture for
N=16 samples;

Fig. 4 shows an internal structure of a Butterfly stage of
first type, BF2I, with signals input thereto being divided
into real and imaginary part;

Fig. 5 shows an internal structure of a Butterfly stage of
second type, BF2II, with signals input thereto being
divided into real and imaginary part;

Fig. 6 shows a block circuit illustration of a parallel
symbol FFT transformation architecture;

Fig. 7 shows a block circuit illustration of a successive
symbol FFT transformation architecture;

Fig. 8 shows a timing diagram for the successive FFT
transformation architecture of Fig. 7. Note that this
diagram shows the timing for the first stage for the input
signal of the FFT length N only. However, the timing for
the following butterfly stages can be derived based on the
timing of the first stage. For this purpose, according to
the stage i, the N value has to be adapted to N=2^(k+1-i);

Fig. 9 shows a block circuit illustration of an embodiment
of a multi-stream FFT architecture, as applicable for
example to a 4 antenna MIMO receiver; and

Fig. 10 shows a basic timing diagram for the FFT
architecture according to the embodiment shown in Fig. 9.
Note that this diagram shows the timing for the first stage
for the input signal of the FFT length N only. However, the
timing for the following butterfly stages can be derived
based on the timing of the first stage. For this purpose,
according to the stage i, the N value has to be adapted to
N=2^(k+1-i);

Fig. 11A and 11B show details of the data control in terms
of control signals applied to a butterfly stage of type
BF2II according to prior art (Fig. 11A) and the present
invention (Fig. 11B), respectively;

Fig. 12 shows details of timing relations between the
control signals shown in Fig. 11A and applied according to
the prior art;

Fig. 13 shows details of timing relations between the
control signals shown in Fig. 11B and applied according to
the present invention;

Fig. 14A shows a block circuit diagram of a control module
according to the present invention, and

Fig. 14B shows a block circuit diagram of a modification of
a control module according to the present invention;

Fig. 15 shows parts of a system comprising at least one
terminal and at least one network element each of which
incorporates the FFT according to the present invention.

DETAILED DESCRIPTION OF THE PRESENT INVENTION 

According to the present invention, basically, in N-by-Mr
MIMO systems, there are Mr data input streams in parallel.
(Note that this means here a system with N transmit and Mr
receive antennas, and that this N is not equal to the number
N of symbol samples to be subjected to FFT processing.) For
this reason, an FFT architecture is implemented which is
able to process several data streams simultaneously at a
rate Mr times the sample rate (of the individual data
stream). (This means that a clock signal clk' supplied to an
arrangement according to the present invention has Mr times
the frequency, and 1/Mr times the period, of the clk signal
applied to the prior art arrangement.)

Fig. 9 illustrates an FFT architecture for Mr=4 parallel
data streams and Fig. 10 shows the basic timing of the
signal processing, according to the present invention.

In the first step of the process, the Mr (Mr=4) data
streams x1(n), x2(n), x3(n) and x4(n) are multiplexed to a
single stream x'(n) that is directly fed to the FFT pipeline
processor. For this reason, there is no need to introduce
any input buffer, which would have at least a size of Mr
times the number N of data samples to be subjected to FFT
transformation. (N is also referred to as "FFT length".)
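
By way of illustration only, the sample-wise multiplexing
described above may be sketched in Python as follows. The
function names multiplex and demultiplex and the toy sample
values are assumptions made for this sketch and are not taken
from the figures.

```python
def multiplex(streams):
    """Interleave Mr equally long streams sample by sample into one stream."""
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all streams must have the same length")
    # zip(*streams) yields (x1(n), x2(n), ..., xMr(n)) for n = 0, 1, ...
    return [sample for group in zip(*streams) for sample in group]


def demultiplex(stream, mr):
    """Split an interleaved stream back into mr streams (inverse of multiplex)."""
    return [stream[j::mr] for j in range(mr)]


# Toy example with Mr = 4 streams of N = 2 samples each.
x = [[10, 11], [20, 21], [30, 31], [40, 41]]
x_prime = multiplex(x)
```

Since each input sample is forwarded as soon as it arrives, the
sketch needs no storage between the streams and the single
multiplexed stream.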

For the transformation of the input x'(n), the known
architecture is, according to the present invention,
modified in respect of the aspects outlined below. Due to
the four-fold amount of data (generally, Mr-fold) at each
stage, the FIFO memory size in the feedback path of each
stage is extended by a factor of four (generally, Mr). In
addition, since the same twiddle factors are used for each
of the four streams, the twiddle factors change four times
slower compared to the single-stream FFT.

This means that the simple multipliers are maintained
active Mr times longer and also the factors W(n) are
applied Mr times longer.

Finally, the transformed data streams contained in an FFT 
output stream X(k) are demultiplexed corresponding to the 
multiplexing at the beginning of the FFT. 
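
The processing described above may be checked with the
following behavioural sketch, given by way of example only. It
assumes a plain radix-2 decimation-in-frequency formulation
rather than a cycle-accurate model of the pipeline with BF2I and
BF2II stages, and the function name interleaved_fft is
hypothetical. The butterfly partner of a sample lies mr * half
positions later, which corresponds to the feedback delay
extended by the factor Mr, and each twiddle factor is reused
for mr consecutive samples.

```python
import cmath


def interleaved_fft(x_prime, mr):
    """Radix-2 DIF FFT applied to mr sample-interleaved streams at once.

    x_prime[mr * n + j] holds sample n of stream j. The result is interleaved
    in the same way and in natural order: result[mr * k + j] = X_j(k).
    """
    total = len(x_prime)
    n_fft = total // mr
    if n_fft < 1 or n_fft * mr != total or n_fft & (n_fft - 1):
        raise ValueError("length must be mr times a power of two")
    data = list(x_prime)
    half = n_fft // 2
    while half >= 1:
        block = 2 * half
        for start in range(0, n_fft, block):
            for n in range(half):
                # One twiddle factor serves all mr streams.
                w = cmath.exp(-2j * cmath.pi * n / block)
                for j in range(mr):
                    lo = mr * (start + n) + j
                    hi = lo + mr * half  # partner is mr * half samples later
                    a, b = data[lo], data[hi]
                    data[lo] = a + b
                    data[hi] = (a - b) * w
        half //= 2
    # DIF leaves every stream in bit-reversed order; restore natural order.
    bits = n_fft.bit_length() - 1
    out = [0j] * total
    for k in range(n_fft):
        rev = int(format(k, f"0{bits}b")[::-1], 2) if bits else 0
        for j in range(mr):
            out[mr * rev + j] = data[mr * k + j]
    return out
```

Taking every mr-th element of the returned list, starting at
offset j, yields the transform of stream j, which corresponds
to the sample-wise demultiplexing of X(k).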

The overall memory size is Mr(Nfft - 1). Compared with the
successive architecture described before, this approach
requires a significantly smaller memory size. Because of
the interleaved data processing within the FFT, there is no
need for buffering of the FFT inputs.

Table 2 shows the comparison of the successive multi-stream
FFTs. It can be seen that the new architecture reduces the
memory size by more than 2.5 Nfft complex symbols at the
same computational complexity.

Architecture (Mr = 4, Mfft = 1)                  Memory size
Straightforward successive multi-stream FFT      6.5 Nfft - 1
Successive multi-stream FFT acc. to invention    4 Nfft - 4

Table 2: Memory consumption of the successive alternative
multi-stream FFTs.
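
As a worked example of the figures in Table 2, the two memory
sizes may be evaluated as sketched below. The FFT length of 64
is a hypothetical value chosen only for illustration, and the
function names are likewise assumptions of this sketch.

```python
def memory_invention(n_fft, mr):
    """Feedback FIFO memory of the interleaved architecture: Mr * (Nfft - 1)."""
    return mr * (n_fft - 1)


def memory_straightforward_mr4(n_fft):
    """Table 2 entry for the straightforward successive approach at Mr = 4."""
    return 6.5 * n_fft - 1


n_fft = 64  # hypothetical FFT length, not taken from the description
saving = memory_straightforward_mr4(n_fft) - memory_invention(n_fft, 4)
print(saving)  # 163.0, i.e. 2.5 * 64 + 3 complex symbols
```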



Fig. 9 thus shows a signal processor for Fast Fourier
Transformation, FFT, of Mr, Mr > 1, input data streams
xi(n). In the example shown, Mr=4, so that input data
streams x1(n), ..., x4(n) are supplied in parallel. The data
streams are fed to a multiplexing device MUX having Mr
(here Mr=4) input terminals, each receiving one of the Mr
input data streams x1(n), ..., x4(n). At an output terminal
x'(n) of the multiplexing device, the Mr input data streams
are output in a multiplexed manner. The multiplexed output
represents an interlaced (or interleaved) output of the Mr
data streams, i.e. data samples of the Mr streams are
output alternately.

The thus obtained interlaced and/or multiplexed output data
stream x'(n) is fed to a Fast Fourier Transformation device
FFT. The FFT device is configured to perform Fast Fourier
Transformation of a data stream x'(n) supplied at an input
terminal thereof and to output the FFT transformed data
stream at an output terminal X(k) thereof. Thus, the input
terminal of the Fast Fourier Transformation device FFT is
connected to the output terminal x'(n) of the multiplexing
device MUX. The signal processor further comprises a
demultiplexing device DEMUX having an input terminal
connected to the output terminal X(k) of the Fast Fourier
Transformation device FFT. At Mr output terminals X1(k),
..., X4(k), a respective one of Mr transformed output data
streams is output in a demultiplexed manner. (Note that
x(n) denotes the input signal in the non-FFT transformed
domain whereas X(k) denotes the resulting signal in the FFT
transformed domain. In particular, k of X(k) is distinct
from "k" used in connection with identifying the stages of
an FFT applied.)

According to the present invention, such an FFT device is
designed for each of the Mr input data streams containing a
number of N=2^k samples. Further, the Fast Fourier
Transformation device FFT has a pipeline architecture
composed of k stages with a respective feedback path
including a single delay element per each stage of the
pipeline architecture and is controlled by internal control
signals clk', s', t', and w' (not all individually shown in
Fig. 9). The clock signal clk' is denoted as first control
signal, and control signals s', t', w' are denoted as
second control signals.

According to the present invention, the delay element in a
feedback path of an i-th stage, 1<=i<=k, of the pipeline
architecture imposes a delay of Mr*N/2^i samples, the first
internal control signal clk' is clocked Mr times faster
compared to the supply rate / clock rate of the supplied Mr
streams, and the second internal control signals s', t', w'
are clocked Mr times slower compared to the clock rate clk'
at which the FFT is operating.
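
The delay rule stated above may be illustrated by the following
short sketch, which lists the feedback delay of every stage for
given k and Mr. The function name stage_delays and the example
values k=4, Mr=4 are assumptions made for illustration only.

```python
def stage_delays(k, mr):
    """Feedback delay Mr * N / 2^i, in samples at clk', for stages i = 1..k."""
    n = 2 ** k  # FFT length N = 2^k
    return [mr * n // 2 ** i for i in range(1, k + 1)]


delays = stage_delays(k=4, mr=4)  # N = 16
print(delays)       # [32, 16, 8, 4]
print(sum(delays))  # 60, which equals Mr * (N - 1)
```

The sum over all stages reproduces the overall memory size
Mr(N - 1) of the feedback paths.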

In particular, the multiplexing device MUX is configured
such that the Mr input data streams are multiplexed per
data sample of the input data streams (interlaced) and the
demultiplexing device DEMUX is configured such that the
transformed data stream is demultiplexed per data sample
of the transformed data stream (de-interlaced).

A control signal (not shown) supplied to the multiplexer
and demultiplexer is clocked at a rate of Mr*clk, which
means that it is operated at Mr times the clock rate clk /
sample rate of the input data streams.

In a particularly advantageous embodiment of the present
invention, the Fast Fourier Transformation device FFT has a
Radix-2 Single-path Delay Feedback, R2SDF, architecture.
Also, the FFT device is clocked Mr times faster than the
sample rate clk of an individual data stream of N samples.
In connection with an R2SDF FFT device, the pipeline
architecture of the Fast Fourier Transformation device is
composed of Butterfly stages of types I and II (BF2I,
BF2II).

In such a case, the first (input) stage of the pipeline
architecture receiving the multiplexed data streams is a
Butterfly stage of type I for even and odd total numbers of
stages. The internal structure and operation of BF2I and
BF2II stages are as shown in Figs. 4 and 5, and only the
timing of the control signals is different in connection
with the present invention.

Fig. 11B shows details of control signals with a
corresponding timing relation being illustrated in Fig. 13.
Fig. 11B is substantially identical to Fig. 11A except that
the control signals are additionally denoted with an
apostrophe to make clear that the control signals applied
according to the present invention differ in timing from
those applied in the prior art arrangement.

Fig. 13 shows the timing relation therebetween.

In the lower part of Fig. 13, the signals z', w' and clk'
are shown. With each clock cycle of clk', a new signal z'
is supplied to the multiplier, which is supplied with a
corresponding weight (twiddle) factor w' which changes only
after Mr cycles of clk'. In the upper part of Fig. 13 it is
shown that a sample x' of a respective one out of Mr
sequences of 1...N samples each (forming one OFDM symbol) is
supplied with each clock cycle of clk' in a multiplexed
(interlaced) manner. Initially, the signal s' assumes a low
level (s'=0) for the first Mr*N/2 samples. Thereafter,
starting with the interlacing of sample Mr*N/2+1, it
assumes a high level until Mr*N samples of all streams of a
symbol have been supplied. (Thereafter, a new OFDM symbol
sequence starts with s'=0.) As to the signal t', this
signal assumes a high level for the first Mr*3*N/4 samples
and changes afterwards (starting with the interlacing of
samples 3*N/4+1) to the low level for the last Mr*N/4
samples.
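
The first-stage levels of s' and t' described above may be
tabulated per clk' cycle as in the following sketch. It is an
illustration only; the function name first_stage_controls and
the example values N=8, Mr=4 are assumptions, and N is taken to
be a multiple of four.

```python
def first_stage_controls(n, mr):
    """Return the levels of s' and t' for the mr * n clk' cycles of one symbol.

    s' is low for the first mr * n / 2 cycles and high afterwards.
    t' is high for the first mr * 3 * n / 4 cycles and low afterwards.
    """
    total = mr * n
    s_levels = [0 if cycle < mr * n // 2 else 1 for cycle in range(total)]
    t_levels = [1 if cycle < mr * 3 * n // 4 else 0 for cycle in range(total)]
    return s_levels, t_levels


s_prime, t_prime = first_stage_controls(n=8, mr=4)  # 32 clk' cycles
```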

Thus, the second internal FFT control signals s', t', w'
are clocked Mr times slower compared to the clock rate clk'
at which the FFT is operating, and the clock rate clk' at
which the FFT is operating is Mr times faster than the
clock rate clk at which the samples of the Mr streams are
supplied. Speeding up the clock rate clk' at which the FFT
device operates by a factor Mr adjusts the FFT clock rate
to the number Mr of externally supplied data streams, and
slowing the control signals s', t', w' down by a factor Mr
compensates for this by adjusting the other internal
control signals of the FFT to the new clock rate clk' at
which the FFT is operating.

As mentioned beforehand, it is to be noted that this
diagram shows the timing for the first stage only, i.e. for
the input signal of the FFT of length N. However, the timing
for the following butterfly stages can be derived from the
timing of the first stage. To this end, for stage i, the
value of N (based on which the timing is indicated) has to
be adapted to N = 2^(k-i+1).

Fig. 14A shows a block circuit diagram of a control module
according to the present invention. As illustrated, a clock
rate clk of the Mr supplied streams is supplied to the
control module, as well as information on Mr as such. Both
of these can be fixedly configured in the FFT device, or
communicated to the device during its lifetime. In a first
frequency division block, the first internal control signal
clk' of the FFT device is generated such that the first
internal control signal (clk') is clocked Mr times faster
compared to the clock rate (clk) at which the samples of the
Mr streams are supplied. This first internal control signal
is supplied to a control signal generation block of the FFT
device. Based on the supplied clock signal, second internal
control signals s, t, and w are generated, basically in the
manner known from the prior art for controlling the
pipeline FFT architecture as described hereinbefore, i.e.
based on the number of clock cycles/samples of a single
stream processed. The first internal control signal clk' is
also passed to the pipeline architecture.

However, due to those (intermediate) second internal
control signals s, t, and w being generated based on clk',
the increased frequency thereof is to be compensated. This
is accomplished by a second frequency divider block. The
(intermediate) second internal control signals s, t, and w
are supplied thereto as well as the indication of Mr, and
an output of the second internal control signals s', t',
and w' is generated such that the second internal control
signals (s', t', w') are Mr times slower compared to the
first internal control signal (clk'). Then, the signals
s', t', w' are also supplied to the FFT pipeline
architecture.
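
The effect of the second frequency divider block may be
pictured, purely as a behavioural sketch, as holding every
value of a single-stream control sequence for Mr cycles of
clk'. The function name slow_down and the example twiddle index
sequence are hypothetical and serve only for illustration.

```python
def slow_down(signal, mr):
    """Hold every value of a single-stream control sequence for mr clk' cycles."""
    return [value for value in signal for _ in range(mr)]


w_single = [0, 1, 2, 3]               # twiddle indices as used for one stream
w_prime = slow_down(w_single, mr=2)   # [0, 0, 1, 1, 2, 2, 3, 3]
```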

Fig. 14B shows a block circuit diagram of a modification of
a control module according to the present invention. The
indication of the Mr streams to be processed is supplied to
a FIFO Control block, where a memory control signal MEM_CTRL
is generated therefrom. The signal MEM_CTRL is then
supplied to the control section of e.g. a FIFO memory or
any other memory having FIFO capabilities within a feedback
path of a respective stage of the FFT pipeline structure.
As described above, according to the present invention, a
memory (e.g. FIFO) in a feedback path of the FFT pipeline
imposes a delay of Mr*N/2^i samples on the samples in the
feedback path of an i-th stage, 1<=i<=k. This is based on
the assumption of a fixed number of Mr streams to be
processed which is known beforehand, i.e. at FFT device
production.

Fig. 14B now illustrates an example in which a FIFO or any
other memory is composed of a number of j=1...Mrmax memory
cells, each comprising memory locations for data samples to
be buffered. By virtue of the control signal MEM_CTRL, a
number of Mr=x cells can be selected to be actively used in
the FIFO. Hence, data supplied at clock rate clk' are output
in a FIFO manner after Mr=x memory cells. This can be
regarded as a FIFO that can be "tapped" dependent on the
control signal MEM_CTRL. Such a feature provides for
increased flexibility of application of the FFT structure
in various environments, including SISO (Mr=1) as well as
MIMO applications (Mr=2...Mrmax). The parameter Mr could be
configured upon installation of the FFT device, or could be
transmitted in a special signal (e.g. broadcast signal) and
then detected at the FFT device for self-configuration (or
self-reconfiguration) of the device. The only additional
memory requirement would reside in the feedback paths, but
no buffers as discussed in connection with the approach
shown in Figs. 7 and 8 are needed.
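
A tappable FIFO of the kind described above may be modelled,
by way of a minimal sketch only, as follows. The class name
TappedFifo, the method names and the parameter cell_size (the
number of samples per memory cell, for instance N/2^i for
stage i under the delay rule given above) are assumptions of
this sketch and do not appear in Fig. 14B.

```python
from collections import deque


class TappedFifo:
    """Delay line of mr_max cells of cell_size samples with a selectable tap."""

    def __init__(self, mr_max, cell_size):
        self.mr_max = mr_max
        self.cell_size = cell_size
        self.configure(mr_max)

    def configure(self, mr):
        """Model of MEM_CTRL: use only mr cells (this clears the contents)."""
        if not 1 <= mr <= self.mr_max:
            raise ValueError("mr must be between 1 and mr_max")
        self.depth = mr * self.cell_size
        self.cells = deque([0] * self.depth, maxlen=self.depth)

    def push(self, sample):
        """Shift in one sample per clk' cycle; return the sample leaving the tap."""
        oldest = self.cells[0]
        self.cells.append(sample)  # maxlen discards the oldest entry
        return oldest


fifo = TappedFifo(mr_max=4, cell_size=2)
fifo.configure(2)  # two active cells, i.e. a delay of 4 samples
outputs = [fifo.push(v) for v in range(1, 7)]  # [0, 0, 0, 0, 1, 2]
```

Selecting mr=1 in this sketch corresponds to the SISO case,
while larger values correspond to MIMO operation with the same
memory.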

A signal processor according to any of the previously
described aspects can advantageously form part of a network
element of a communication network. Still further, a signal
processor according to any of the previously described
aspects can advantageously form part of a terminal
configured to communicate via a communication network.
Hence, the present invention also addresses a system
comprising at least one such terminal and at least one
such network element, as shown in outline in Fig. 15. Fig.
15 shows an FFT according to the present invention being
implemented in a MIMO OFDM system comprising a Node_B as a
network element and a user equipment UE as a terminal. As
illustrated by the four (Mr=4) arrows, these communicate in
a MIMO scenario and, in the illustrated example system, each
of them includes an FFT according to the present invention.
(Details of the FFT can be found in the respective other
figures of this application. Note that other components of
a terminal and a network element are not shown as they are
not essential for the present invention.)

Hereinbefore, the present invention has mainly been
described with reference to a hardware implementation as
e.g. usable in an ASIC (Application Specific Integrated
Circuit) or DSP (Digital Signal Processor). The signal
processor can also be a signal processing device
implemented as a chip in semiconductor technology such as
CMOS, BiCMOS, or any other.

For a specific implementation of the invention, it is not
considered essential whether the invention is embodied as a
chip, as a signal processor device or as software code
portions, as all these implementations are equally well
applicable and are chosen according to the circumstances
under which the present invention is to be carried out.
Thus, whether a terminal or network element embodies the
invention as software code portion or as a chip or as a
signal processor device is not the focus of the present
application.

Nevertheless, the present invention may also be carried out 
in terms of a signal processing method as software code 
portions running on a processor, or stored on a storage 
medium and thus adapted to carry out the method when run on 
a processor. 

In this regard, it is to be understood that the present
invention concerns a signal processing method for
performing Fast Fourier Transformation, FFT, of Mr, Mr > 1,
input data streams (x1(n), ..., xMr(n)) supplied in
parallel, comprising the steps of multiplexing the Mr input
data streams (x1(n), ..., xMr(n)) to a multiplexed data
stream, performing Fast Fourier Transformation of the
multiplexed data stream and outputting the transformed data
stream, and demultiplexing the transformed data stream to Mr
transformed output data streams, characterized in that each
of the Mr input data streams contains a number of N=2^k
samples, and by performing FFT transformation using a
pipeline of k stages with a respective feedback path
imposing a delay on the samples per each stage of the
pipeline, controlling the performing of the FFT
transformation by a first (clk') and second internal control
signals (s', t', w'), imposing a delay of Mr*N/2^i samples
on the samples in the feedback path of an i-th stage,
1<=i<=k, of the pipeline, clocking the first internal
control signal (clk') Mr times faster compared to a clock
rate (clk) at which the samples of the Mr streams are
supplied, and clocking the second internal control signals
(s', t', w') Mr times slower compared to the first internal
control signal (clk').

Under the aspect of the method, multiplexing is
accomplished such that the input data streams are
multiplexed per data sample of the input data streams and
demultiplexing is accomplished such that the transformed
data stream is demultiplexed per data sample of the
transformed data stream. Clocking of the multiplexer and
demultiplexer is performed at a rate of Mr*clk, i.e. Mr
times the sample rate of an individual data stream. The
Fast Fourier Transformation processing is based on a Radix-2
Single-path Delay Feedback algorithm, wherein the pipeline
of processing stages for the Fast Fourier Transformation is
composed of Butterfly stages of types I and II (BF2I,
BF2II).

In this connection, the first of the k stages of the
pipeline receiving the multiplexed data stream is a
Butterfly stage of type I for even and odd total numbers k
of stages.

Accordingly, as has been described hereinabove, the
present invention proposes a signal processor for Fast
Fourier Transformation, FFT, of Mr, Mr > 1, input data
streams of 2^k samples each, supplied in parallel. After
multiplexing the input data streams in an interlaced
manner, the resulting stream is FFT processed. The FFT
device has a pipeline architecture composed of k stages
with a respective feedback path including a single delay
element per each stage of the pipeline architecture. The
delay element and timing signals are adapted to cope with
FFT processing of the multiplexed streams using the single
FFT device only. After processing, the FFT processed data
stream is demultiplexed.

Although the invention has been described in terms of
particular embodiments, various modifications are possible
without departing from the scope and spirit of the
invention as defined by the appended claims.

It should be appreciated that whilst embodiments of the
present invention have mainly been described in relation to
mobile communication devices such as mobile stations,
embodiments of the present invention may be applicable to
other types of communication devices that may access
communication networks. Furthermore, embodiments may be
applicable to other appropriate communication systems, even
if reference has mainly been made to mobile communication
systems.

List of abbreviations:

OFDM    Orthogonal Frequency Division Multiplex
SISO    Single Input Single Output
MIMO    Multiple Input Multiple Output
FFT     Fast Fourier Transformation
BF      Butterfly
CFA     Common Factor Algorithm
DIF     Decimation-In-Frequency
SFG     Signal Flow Graph
SDF     Single-Path Delay Feedback



